Abstract — In content-based image retrieval, relevance
feedback has been studied extensively to narrow the gap between
low-level image features and high-level semantic concepts. In
general, relevance feedback aims to improve retrieval
performance by learning from the user's judgments on the
retrieval results. Despite widespread interest, feedback-related
technologies face several limitations. One of the most obvious
is that the user is often required to repeat a number of steps
before obtaining improved search results, which makes the
search inefficient and tedious for online applications. In this
paper, an effective relevance feedback scheme for content-based
image retrieval is proposed. First, a decision boundary is
learned via a Support Vector Machine to filter the images in the
database. Then, a ranking function for selecting the most
informative samples is calculated by defining a novel criterion
that considers both the scores of the Support Vector Machine
function and the similarity metric between the "ideal query"
and the images in the database. Experimental results on
standard datasets show the effectiveness of the proposed method.

Index Terms — Interactive image retrieval, Content-based 
image retrieval, Relevance feedback, SVM Active learning, 
Batch mode active learning. 

I. Introduction 

The rapid development of digital devices and the dominance
of social networks have led to a great demand for sharing,
browsing and searching images. To satisfy these requirements,
image retrieval systems have become an urgent necessity.
Basically, there are two main frameworks for building image
retrieval systems: text-based and content-based systems [8]. In
text-based image retrieval systems, users' queries are composed
of keywords that describe image content, and the system
retrieves images based on manually annotated labels. However,
the difficulty of annotating a massive number of images, and of
avoiding subjective labelling, makes this framework
impractical. To overcome these hindrances, Content-Based Image
Retrieval (CBIR) has emerged as a more suitable approach, which
aims to bring image content closer to human understanding.

In CBIR, low-level visual features such as colors, textures,
patterns, and shapes are used to describe image contents.
These low-level features are extracted automatically to
represent the images in the database without manual
intervention. The advantage over keyword-based image
retrieval lies in the fact that feature extraction can be
performed automatically and the image's own content is
always consistent. However, the most challenging problem in
CBIR systems is the semantic gap [9], [5]: images of
dissimilar semantic content may share some common
low-level features, while images of similar semantic content
may be scattered in the feature space. Despite a great deal of
research dedicated to the search for an ideal descriptor of
image content [3], [1], [2], [11], [22], [4], retrieval
performance is far from satisfactory due to the fundamental
difference between human understanding (high-level
concepts) and machine understanding (low-level features). To
narrow the semantic gap, one possible solution is to
integrate human interaction into the system, which is popularly
known as Relevance Feedback (RF) [13], [15]. In general, RF
aims to improve retrieval performance by learning from the
user's judgments on the retrieval results. To do so, the system
runs through several iterations. In each iteration, the CBIR
system first returns a short list of top-ranked images for the
user's query, using a regular retrieval approach based on the
Euclidean distance measure; the user then labels some of these
images as relevant or irrelevant (positive or negative
examples). Using these labeled images as seeds, machine
learning techniques build a model that classifies the database
images into two classes: one containing images that are
expected to satisfy the user and the other containing the
irrelevant images. A typical scenario for a CBIR system with RF
using machine learning [9] (represented in Figure 1) is as
follows:

1. The user chooses a query image, and the low-level
features of the query image are extracted.

2. Result images are returned. There are two cases:

a) Initial phase: results depend on the similarity measure
of low-level features between the query image and the
database images, since no training examples are yet
available to train a machine learning classifier.

b) RF loops: the classifier's decision function is used
as the ranking function.

3. The user judges whether, and to what degree, these result
images are relevant (positive examples) or irrelevant
(negative examples) to the query example. After
judging, these images are labeled.

4. A machine learning algorithm is applied to learn the user
feedback using the labeled examples obtained from the first
to the current iteration. Then go back to Step 2.

Note that in this scenario, Steps 2, 3 and 4 are repeated until
the user is satisfied with the results.

Figure 1: The CBIR system with Relevance Feedback
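The four-step scenario can be sketched as a small loop. This is a minimal sketch under stated assumptions: the user is simulated by category labels (as in the experiments of Section IV), and `nn_scorer` is a hypothetical nearest-neighbour stand-in for the SVM, used only so the example runs with the Python standard library. Any function that turns labeled pairs into a real-valued scoring function can be passed as `train`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nn_scorer(labeled):
    """Hypothetical stand-in for the classifier of Step 4 (not an SVM).

    Scores x by (distance to nearest negative) - (distance to nearest
    positive), so larger scores mean "more likely relevant".
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == -1]

    def score(x):
        d_pos = min(euclidean(x, p) for p in pos) if pos else 0.0
        d_neg = min(euclidean(x, n) for n in neg) if neg else 0.0
        return d_neg - d_pos

    return score


def relevance_feedback(query, database, categories, query_cat,
                       rounds=3, top=15, train=nn_scorer):
    """Runs Steps 2-4 of the scenario with a simulated user."""
    # Step 2a: no training data yet, so rank by Euclidean distance.
    ranking = sorted(range(len(database)),
                     key=lambda i: euclidean(query, database[i]))
    labels = {}  # database index -> +1 (relevant) / -1 (irrelevant)
    for _ in range(rounds):
        # Step 3: the simulated user labels the top results by category.
        for i in ranking[:top]:
            labels[i] = 1 if categories[i] == query_cat else -1
        # Step 4: learn from all labels collected so far.
        score = train([(database[i], y) for i, y in labels.items()])
        # Step 2b: the classifier output becomes the ranking function.
        ranking = sorted(range(len(database)),
                         key=lambda i: score(database[i]), reverse=True)
    return ranking
```

Because `train` is a parameter, the same loop could wrap a real SVM decision function without changing the feedback logic.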


From a general machine learning view, RF is essentially a 
binary classification problem in which sample images 
provided by the user are employed to train a classifier, which 
is then used to classify the database into images that are 
relevant to the query and those that are not [8], [9]. However, 
RF is very different from the traditional classification
problem because the feedback provided by the user is often
limited in real-world image retrieval systems. Therefore,
small sample learning methods are the most promising for RF.

Support Vector Machine (SVM) is one of the popular small
sample learning methods widely used in recent years, and it
performs very well on pattern classification problems [12],
[10], [20], [19]. Compared with other learning algorithms, SVM
appears to be a good candidate for several reasons: its
generalization ability, the absence of restrictive assumptions
regarding the data, fast learning and evaluation for relevance
feedback, and flexibility (e.g., prior knowledge can easily be
used to tune its kernels). However, in SVM-based relevance
feedback, retrieval performance degrades when the number of
labeled positive feedback samples is small.

SVM active learning selects samples close to the decision
boundary as the most informative samples for the user to label
in each round of RF [16], [6], [14]. Although SVM
active learning-based relevance feedback can work better than
conventional SVM-based relevance feedback, it has two
major drawbacks. First, the performance of SVM is usually
limited by the number of labeled examples. Second, since the
batch of examples is selected all at once, the labels of
examples selected earlier in the batch cannot influence the
selection of the remaining examples in the batch. Different
strategies to solve this problem have been proposed [7], [21],
[18]. Hoi et al. [7] proposed Semi-Supervised SVM Batch Mode
Active Learning. This method first constructs a kernel
function learned from a mixture of labelled and unlabelled
examples. The kernel is then used to effectively identify the
most informative and diverse examples for active learning via
a min-max framework. Zhang et al. [21] proposed a dynamic batch
mode SVM active learning scheme, which selects a batch of
examples one by one, using the label of the previously
selected example to guide the selection of the next one, so that
the selection of feedback examples is determined by both the
existing classification boundary and the previously labelled
examples. In these solutions, the selection of examples for the
user to label in each round of RF is determined solely by the
existing SVM decision boundary. However, in early iterations,
the SVM decision boundary might not be accurate due to the lack
of training examples. In this case, the samples selected by
these methods may not be the most informative ones, which makes
subsequent learning inefficient. Consequently, retrieval
performance may remain poor even after several rounds of
learning.

To address the above problems, we propose a novel batch mode
for SVM active learning. In the proposed method, a decision
boundary is first learned via SVM to filter the images in the
database. Then, a ranking function is constructed by defining a
novel criterion that considers both the scores of the SVM
function and the similarity measure between the query and the
images in the database. This can effectively reduce the
adverse effect of an inaccurate decision boundary. By using a
priority coefficient in the ranking function, we can select a
batch of feedback examples that may be informative enough to
improve retrieval accuracy significantly. Experimental results
on standard datasets show the effectiveness of the proposed
method, especially when the number of initially labelled
samples is small in early iterations.

The rest of this paper is organized as follows. Section II
presents the basic theory of SVM-based RF. Section III
presents the problem formulation and our solution. The
retrieval performance of the proposed method is presented in
Section IV. Finally, we give conclusions and discuss future
research directions.

II. SVM-BASED RELEVANCE FEEDBACK 

SVM was first introduced by Vapnik et al. in [17] and remains
an active area of machine learning research around the world.
With strong theoretical foundations, it is used in many
applications and is a popular small sample learning method
with very good performance on pattern classification problems.
The key idea of SVM is, given a set of labelled examples
$L = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$
represents an image by a d-dimensional vector and
$y_i \in \{1, -1\}$ is its label, to find a hyperplane

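$$f(x) = w \cdot x + b \tag{1}$$

that achieves the best separation of the two classes, such that
the empirical risk is minimized and the margin is maximized
for the training vectors that are correctly classified. This is a
quadratic programming problem, solved by finding $w$ and $b$
as follows:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \tag{2}$$

$$\text{s.t.}\quad y_i(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \ldots, n,$$

where the $\xi_i$ are slack variables and $C > 0$ is a penalty
parameter that trades margin width against training errors.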
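To make Eq. (2) concrete, the sketch below evaluates (rather than minimizes) the primal objective at a candidate $(w, b)$. It relies on the standard observation that, for fixed $(w, b)$, the smallest slack satisfying the constraints is the hinge loss $\max(0, 1 - y_i(w \cdot x_i + b))$. The function names are hypothetical; a real SVM solver searches over $(w, b)$ to make this value as small as possible.

```python
def dot(u, v):
    """Inner product w.x of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def soft_margin_objective(w, b, X, y, C):
    """Evaluates the primal objective of Eq. (2) at a candidate (w, b).

    For fixed (w, b), the smallest slacks satisfying the constraints are
    xi_i = max(0, 1 - y_i (w.x_i + b)), i.e. the hinge loss per example.
    Returns the objective value and the list of slacks.
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks
```

In this toy case only the second example lies inside the margin, so it is the only one with a nonzero slack.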

The corresponding dual form is: find the parameters
$\alpha_i$, $i = 1, \ldots, n$, that maximize

$$L(\alpha) = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \tag{3}$$

$$\text{s.t.}\quad \sum_{i=1}^{n} y_i\alpha_i = 0,\quad 0 \le \alpha_i \le C,\quad i = 1, \ldots, n,$$
where $K(x_i, x_j)$ is a kernel function. Many kernel functions
exist for nonlinear mapping; in our experiments we use the
Gaussian radial basis function as the kernel:

$$K(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma^2}\right) \tag{4}$$

where the parameter $\sigma$ is the width of the Gaussian function.
For a given kernel function, the SVM classifier is given by

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b\right) \tag{5}$$

and the decision boundary is $\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b = 0$.
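A small sketch of the kernel classifier may help connect the dual coefficients to predictions. It assumes the kernel form $\exp(-\|x - z\|^2/\sigma^2)$ (some formulations use $2\sigma^2$ or a $\gamma$ parameter instead) and takes the $\alpha_i$, labels and $b$ as already produced by some SVM solver; the function names are hypothetical. The real-valued `svm_score` is the quantity whose magnitude later serves as a distance from the boundary.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel: exp(-||x - z||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma ** 2)


def svm_score(x, support_vectors, alphas, labels, b, sigma):
    """Real-valued SVM function: sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(a * y * rbf_kernel(sv, x, sigma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


def svm_predict(x, support_vectors, alphas, labels, b, sigma):
    """Sign of the score, as in Eq. (5); a score of exactly 0 maps to +1 here."""
    s = svm_score(x, support_vectors, alphas, labels, b, sigma)
    return 1 if s >= 0 else -1
```

With one positive support vector at 0 and one negative at 2 (equal weights, $b = 0$), the point 1 lies exactly on the decision boundary.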


In SVM-based CBIR relevance feedback, the decision
boundary is used to measure the relevance between a
given pattern and the query image. In general, the larger the
absolute value of the SVM function for an example, the higher
the corresponding prediction confidence. In a traditional
relevance feedback method, users judge the top-ranked image
examples, which have the largest values of the SVM function
$f(x)$. This strategy is called passive feedback. It tends to
choose the most relevant examples, but these might not be the
most informative examples for training the SVM. Active learning
was proposed to deal with this problem. Pool-based active
learning, a subfield of machine learning and one of the most
promising methods currently available, tends to choose the
most uncertain examples, which are close to the SVM decision
boundary.
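The difference between the two selection strategies described above can be seen in a few lines. This sketch assumes the SVM values $f(x)$ of the unlabeled images are already available as a list; the function names are hypothetical.

```python
def passive_selection(scores, k):
    """Passive feedback: the k examples with the largest SVM values f(x)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def active_selection(scores, k):
    """SVM active learning: the k examples with the smallest |f(x)|,
    i.e. those closest to the decision boundary."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return order[:k]
```

For scores `[2.0, -0.1, 0.5, -3.0, 0.05]` and `k = 2`, passive feedback picks indices `[0, 2]` (most confidently relevant), while active learning picks `[4, 1]` (most uncertain).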


III. BATCH MODE FOR SVM ACTIVE LEARNING 

In a CBIR system, RF can be formulated as an active
learning problem in which the most informative unlabeled
examples are selected to improve classification
performance. Let $L = \{(x_1, y_1), \ldots, (x_l, y_l)\}$ denote the
labeled image examples solicited through RF, and
$U = \{x_{l+1}, \ldots, x_{l+m}\}$ the unlabeled image examples, where
$x_i \in \mathbb{R}^d$ represents an image by a d-dimensional vector. Let
$S$ be a set of $k$ unlabeled image examples to be selected in
RF, and $\mathrm{risk}(f, S, L, U)$ a risk function that depends
on the classifier $f$. In [7], selecting the most informative
unlabeled examples for RF is defined as finding the
assignment vector $S$ that minimizes the risk function:

$$S^* = \arg\min_{S \subseteq U,\ |S| = k} \mathrm{risk}(f, S, L, U) \tag{6}$$

The SVM-based active learning method selects the
unlabeled example that is closest to the decision boundary.
This can be expressed by the following optimization problem:

$$x^* = \arg\min_{x \in U} |f(x)| \tag{7}$$

For a query, after the boundary is learned from the
user's feedback, the images in the database are filtered by the
decision boundary. However, in early iterations, the SVM
decision boundary might not be accurate due to the lack of
training examples, which results in poor retrieval
performance. In this case, the similarity measure of
low-level features may be more reliable and can be used to
mitigate this problem. Therefore, we propose a method that
combines the SVM function score and the similarity measure
into a single ranking function.

Let $DS_i$ denote the distance of image $i$ from the
decision boundary given by SVM active learning:

$$DS(x_i) = |f(x_i)| = |w \cdot x_i + b| \tag{8}$$

where $w$ and $b$ denote the normal vector and the bias of
the separating hyperplane, respectively, and $x_i$ is the feature
vector representing image $i$. Let $DE_i$ denote the
Euclidean distance between image $i$ and the "ideal query"
image $c$:

$$DE(x_i) = \begin{cases} \|x_i - x_c\| & \text{if } f(x_i) > 0 \\ \infty & \text{otherwise} \end{cases} \tag{9}$$

where $x_c = \arg\max_{x_j \in U} DS(x_j)$. The ranking function
of our method for the $i$-th image is defined as follows:


$$DSE(x_i) = \frac{N_{rel}}{N_{rel} + N_{nonrel}}\,DS(x_i) + \left(1 - \frac{N_{rel}}{N_{rel} + N_{nonrel}}\right) DE(x_i) \tag{10}$$

where $N_{rel}$ is the total number of relevant images and
$N_{nonrel}$ is the total number of non-relevant images in each
loop. We choose the unlabeled examples with the smallest
values of the ranking function $DSE$ for the user to label:

$$x^* = \arg\min_{x \in U} DSE(x) \tag{11}$$
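A worked instance of Eq. (10) shows how the priority coefficient shifts the emphasis. The numbers are hypothetical, not experimental results: suppose the user labels 15 images in a round and 3 of them are relevant, so

$$\frac{N_{rel}}{N_{rel}+N_{nonrel}} = \frac{3}{3+12} = 0.2, \qquad DSE(x_i) = 0.2\,DS(x_i) + 0.8\,DE(x_i).$$

If instead 12 of the 15 labeled images were relevant, the weights would become $0.8\,DS(x_i) + 0.2\,DE(x_i)$. The Euclidean term therefore dominates while few relevant examples are available, and the SVM term dominates as relevant examples accumulate. Whenever the weight on $DE$ is nonzero, any image with $f(x_i) \le 0$ has $DE(x_i) = \infty$ by Eq. (9) and hence $DSE(x_i) = \infty$, so such images are selected only after all images on the positive side of the boundary.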

The overall algorithm of batch mode for SVM active 
learning is briefly described in Algorithm 1 . 

IV. RESULTS AND ANALYSIS

To evaluate the performance of the proposed algorithm, we
conduct an extensive set of CBIR experiments comparing the
proposed algorithm with several SVM feedback methods
that have been used in image retrieval. The image database is
a subset of the Corel Gallery containing 10800 images from
about 80 different categories (autumn, aviation, bonsai,
castle, cloud, dog, elephant, iceberg, primates, ship,
tiger, ...). Each category consists of about 100 images, and all
images are category-homogeneous. For feature representation,
we extract three types of features, color, texture and shape,
as used in [7].

Algorithm 1: Batch Mode for SVM Active Learning

Input:
  L, U   /* labeled and unlabeled data */
  k, K   /* batch size and an input kernel, e.g. an RBF kernel */

Output:
  S      /* a batch of unlabeled examples selected for labeling */

Procedure:
1. Train an SVM classifier: $f^* = \mathrm{SVMTrain}(L, K)$;  /* call a standard SVM solver */
2. Compute $DS = (|f^*(x_{l+1})|, \ldots, |f^*(x_{l+m})|)^T$;
3. Compute $DE = (DE(x_{l+1}), \ldots, DE(x_{l+m}))^T$ by Eq. (9);
4. $S = \emptyset$;
5. while $|S| < k$ do
6.   for each $x_j \in U$ do
7.     $DSE(x_j) = \frac{N_{rel}}{N_{rel} + N_{nonrel}} DS(x_j) + \left(1 - \frac{N_{rel}}{N_{rel} + N_{nonrel}}\right) DE(x_j)$;
8.   end for
9.   $x^* = \arg\min_{x_j \in U} DSE(x_j)$;
10.  $S \leftarrow S \cup \{x^*\}$;
11.  $U \leftarrow U \setminus \{x^*\}$;
12. end while
13. return $S$.
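Steps 2 to 13 of Algorithm 1 can be sketched in standard-library Python. This is a minimal sketch under stated assumptions: Step 1 is not shown, and the SVM outputs $f^*(x)$ for the images in $U$ are passed in from any standard solver; $x_c$ follows the text's definition $\arg\max_{x_j \in U} DS(x_j)$ literally; and the function name `batch_mode_select` is hypothetical. Because $N_{rel}$, $N_{nonrel}$, $DS$ and $DE$ do not change inside the while loop, the loop is equivalent to taking the $k$ images with the smallest $DSE$.

```python
import math


def batch_mode_select(U, scores, n_rel, n_nonrel, k):
    """Returns k indices into U, following Steps 2-13 of Algorithm 1.

    U         -- feature vectors of the unlabeled images
    scores    -- SVM outputs f*(x) for each image in U (assumed precomputed)
    n_rel     -- number of images labeled relevant
    n_nonrel  -- number of images labeled non-relevant (n_rel + n_nonrel > 0)
    """
    ds = [abs(s) for s in scores]                      # Step 2, Eq. (8)
    c = max(range(len(U)), key=lambda i: ds[i])        # "ideal query" x_c
    de = [math.dist(U[i], U[c]) if scores[i] > 0 else math.inf
          for i in range(len(U))]                      # Step 3, Eq. (9)
    lam = n_rel / (n_rel + n_nonrel)                   # priority coefficient

    def dse(i):                                        # Step 7, Eq. (10)
        # Skip the DE term when its weight is 0: 0 * inf is NaN in floats.
        de_term = (1 - lam) * de[i] if lam < 1 else 0.0
        return lam * ds[i] + de_term

    remaining = list(range(len(U)))                    # indices still in U
    S = []                                             # Step 4
    while len(S) < k and remaining:                    # Step 5
        best = min(remaining, key=dse)                 # Step 9, Eq. (11)
        S.append(best)                                 # Step 10
        remaining.remove(best)                         # Step 11
    return S
```

The guard on `lam < 1` is a design choice of this sketch: when every labeled image is relevant, the DE weight is zero and the ranking reduces to $DS$ alone, as Eq. (10) intends.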


For color, we use color moments. First, we convert the
color space from RGB to HSV. Then, we extract three moments,
the color mean, color variance and color skewness, in each
color channel. Thus, a 9-dimensional color moment vector is
used.
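The color feature can be sketched with the standard-library `colorsys` module. Several details are assumptions not stated in the text: pixels are given as `(r, g, b)` tuples in [0, 1], variance is the population variance, and skewness is taken as the signed cube root of the third central moment (one common convention for color moments). The function name is hypothetical.

```python
import colorsys
import math


def color_moments(pixels):
    """9-D color moment vector: (mean, variance, skewness) for H, S and V.

    pixels: list of (r, g, b) tuples with components in [0, 1].
    """
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    n = len(hsv)
    features = []
    for ch in range(3):  # 0 = H, 1 = S, 2 = V
        vals = [p[ch] for p in hsv]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        # Signed cube root keeps the skewness on the same scale as the mean.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features.extend([mean, var, skew])
    return features
```

The output is ordered `[mean_H, var_H, skew_H, mean_S, var_S, skew_S, mean_V, var_V, skew_V]`.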

For texture, a pyramidal wavelet transform (PWT) is
performed on the gray-level images. Each wavelet decomposition
of a gray 2-D image results in four scaled-down sub-images.
In total, a 3-level decomposition is conducted, and features are
extracted from 9 of the sub-images by computing their entropy.
Thus, a 9-dimensional wavelet vector is used.
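The texture step can be sketched as follows. This is a minimal sketch under assumptions the text does not state: a Haar wavelet with averaging normalisation, the 9 sub-images taken as the three detail bands of each of the 3 levels (the approximation band is decomposed further, as in a pyramidal transform), and entropy computed as the Shannon entropy of normalised coefficient energies. All function names are hypothetical.

```python
import math


def haar_level(img):
    """One level of a 2-D Haar decomposition (averaging normalisation).

    img must have even height and width. Returns the approximation band and
    the three detail bands (horizontal, vertical, diagonal), each half size.
    """
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ra.append((a + b + c + d) / 4)
            rh.append((a + b - c - d) / 4)
            rv.append((a - b + c - d) / 4)
            rd.append((a - b - c + d) / 4)
        approx.append(ra)
        horiz.append(rh)
        vert.append(rv)
        diag.append(rd)
    return approx, horiz, vert, diag


def band_entropy(band):
    """Shannon entropy (bits) of the normalised coefficient energies."""
    energies = [v * v for row in band for v in row]
    total = sum(energies)
    if total == 0:
        return 0.0
    return -sum(e / total * math.log2(e / total) for e in energies if e > 0)


def pwt_entropy_features(gray, levels=3):
    """9-D texture vector: entropy of the 3 detail bands at each of 3 levels.

    Side lengths of `gray` must be divisible by 2 ** levels.
    """
    features, current = [], gray
    for _ in range(levels):
        current, h, v, d = haar_level(current)  # only LL is decomposed again
        features.extend(band_entropy(b) for b in (h, v, d))
    return features
```

A flat image has no detail energy, so all nine entropies are zero; textured images spread energy across coefficients and raise them.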

For shape, the edge direction histogram (EDH) is used as
the shape feature. The edge information contained in the
images is generated and processed using the Canny edge
detection algorithm. The edge direction histogram is
quantized into 18 bins of 20 degrees each, so a total of 18
edge features are extracted.
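The histogram quantization can be sketched as below. Canny edge detection is not in the Python standard library, so the sketch assumes the edge pixels and their gradient components `(gx, gy)` have already been obtained (e.g. from a Canny implementation); normalising by the number of edge pixels is also an assumption. The function name is hypothetical.

```python
import math


def edge_direction_histogram(gradients, bins=18):
    """Histogram of edge directions over 0-360 degrees.

    gradients: (gx, gy) pairs for pixels already marked as edges.
    With bins=18, each bin spans 20 degrees.
    """
    width = 360.0 / bins
    hist = [0] * bins
    for gx, gy in gradients:
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        # The final "% bins" guards against angle rounding up to 360.0.
        hist[int(angle // width) % bins] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * bins
```

Normalising by the edge count makes the 18 features comparable between images with many and few edges.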

All of these features are combined into a single feature
vector with 36 values (9 color, 9 texture and 18 shape
features). We then normalize each feature to a normal
distribution to eliminate the effect of different scales. The
distance between pairs of images is computed as the
Euclidean distance.
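Normalisation and distance computation can be sketched as follows. The sketch interprets "normalize each feature to a normal distribution" as per-dimension standardisation (zero mean, unit variance across the database), which is an assumption; it rescales each feature but does not make its distribution Gaussian. The function names are hypothetical.

```python
import math


def zscore_normalize(vectors):
    """Standardises each feature dimension to zero mean and unit variance."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
            for d in range(dims)]
    # A constant dimension carries no information; map it to 0 to avoid /0.
    return [[(v[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0
             for d in range(dims)] for v in vectors]


def image_distance(a, b):
    """Euclidean distance between two normalised feature vectors."""
    return math.dist(a, b)
```

Without this step, a feature with a large numeric range would dominate the Euclidean distance regardless of how informative it is.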

A. Comparative performance evaluation 

We performed a series of experiments to show the
effectiveness of the proposed method and to compare its
performance with three state-of-the-art SVM feedback
methods: SVM Active Learning [16], SVM Batch Mode
Active Learning [7] and Dynamic Batch Sampling Mode
[21]. To simulate the behaviour of online users, 20 images
randomly selected from each category are used as queries,
giving 1600 query sessions. In the first step of each query
session, the images in the database are ranked according to
their Euclidean distances to the query. The user's relevance
judgments are simulated automatically in each loop: the top
15 images are labeled as relevant or irrelevant. Images in the
same category as the query are considered relevant and the
rest are considered irrelevant. All images labeled in the
feedback loops are used to train the system.

The retrieval results of the proposed algorithm without
relevance feedback are shown in Fig. 2(a). The image at
the top-left corner is the query image; images framed in red
are relevant to the query image, and the rest are irrelevant.
The number of relevant images is clearly very limited: many
images are very close to the query image in feature space but
have very different semantics, and vice versa. However, after
four feedback loops, the number of relevant images retrieved
by the proposed method improves significantly, as shown in
Fig. 2(b).
Figure 2: The retrieval results using the proposed algorithm: (a) the result without relevance feedback, (b) the result after four feedback iterations

Figure 3: Relationship between average AP and number of returned images: (a) the first feedback iteration, (b) the second feedback iteration, (c) the third feedback iteration, and (d) the fourth feedback iteration

Figure 4: Relationship between average AP and number of iterations: (a) the top 20 returned images, (b) the top 40 returned images, (c) the top 60 returned images, and (d) the top 80 returned images


We use Average Precision (AP), as defined for the NIST TREC
Video Retrieval Evaluation (TRECVID), as the evaluation
measure. The AP value obtained at each iteration is the average
of the precision values obtained after each relevant picture is
retrieved, where the precision value is the ratio between the
number of relevant pictures retrieved and the number of
pictures retrieved so far. Since the result of a single query is
not reliable, we compute the retrieval results for many query
images and average them. Moreover, by varying N, the number
of returned images, we plot Mean Average Precision as a
function of N, with N set to 20, 40, 60, 80 and 100. This
experiment evaluates the performance of all four methods under
different user requirements. Several observations can be drawn
from the results in Fig. 3 and Fig. 4. First, the retrieval
performance of all methods improves after a number of rounds,
which indicates the importance of RF in CBIR systems. Second,
our proposed method tends to be more effective than the others
in early iterations. This is expected, because SVM performance
is low when the number of training examples is small, so
ranking images mainly by the similarity measure of low-level
features works better. However, as the number of feedback
iterations increases, the number of training examples becomes
large enough to learn a good SVM, so the similarity measure is
no longer necessary. These results again show the effectiveness
of the proposed method for selecting a batch of informative
unlabeled examples for relevance feedback in CBIR.
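The AP and MAP computation described above can be sketched as follows. The sketch follows the wording of the text: precision is averaged over the ranks at which relevant images appear in the returned list. Other AP variants divide by the total number of relevant images in the collection instead. The function names are hypothetical.

```python
def average_precision(relevance):
    """AP of one ranked result list.

    relevance: 0/1 flags for the returned images, in rank order.
    Averages precision@rank over the ranks where a relevant image appears.
    """
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs, n):
    """MAP over queries, each ranked list truncated to its top n images."""
    return sum(average_precision(r[:n]) for r in runs) / len(runs)
```

For the ranking `[1, 0, 1, 0]`, precision is 1/1 at the first relevant image and 2/3 at the second, giving AP = 5/6.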

V. CONCLUSION

In this paper, we have proposed a novel batch mode SVM
active learning scheme for relevance feedback in CBIR. We
choose a batch of feedback examples for the user to label
using a combined ranking function instead of the SVM
decision function used in traditional methods. Concretely, we
combine the SVM function score and the similarity measure
into a single ranking function. With this combined ranking
function, not only can the adverse effect of an inaccurate
decision boundary due to a lack of initially labelled samples be
effectively reduced, but retrieval performance can also be
further enhanced when there is a sufficient number of initially
labelled samples. Experimental results on a subset of
COREL demonstrate the improvement of the proposed scheme
over traditional schemes, especially when the number of
initially labelled samples is small. As future work, we plan to
extend the experiments to other datasets.